The Efficiency of Corpus-based Distributional Models for Literature-based Discovery on Large Data Sets
Abstract
This paper evaluates the efficiency of several popular corpus-based distributional models for performing discovery on very large document sets, including online collections. Literature-based discovery is the process of identifying previously unknown connections in text, often published literature, that could lead to the development of new techniques or technologies. It has attracted growing research interest ever since Swanson’s serendipitous discovery of the therapeutic effects of fish oil on Raynaud’s disease in 1986. Distributional models have been applied successfully to automate the identification of the indirect associations that underpin literature-based discovery, and this success has been demonstrated extensively in the medical domain. We investigate the computational complexity of distributional models for literature-based discovery on much larger document collections, where they may provide computationally tractable solutions to tasks such as predicting future disruptive innovations. To that end, we perform a computational complexity analysis of four successful corpus-based distributional models to evaluate their fitness for such tasks. Our results indicate that corpus-based distributional models that store their representations in a fixed number of dimensions are the most efficient on literature-based discovery tasks.
Publication date: 2014